Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching
Abstract
We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model that captures all the aspects needed to study such matching engines formally. Based on this model, we propose two types of static analysis, each of which takes a regular expression and determines whether there exists a family of strings that makes Java-style matching run in exponential time.
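To make "a family of strings that makes Java-style matching run in exponential time" concrete, here is a minimal sketch of a toy backtracking matcher that counts its own steps. It is not `java.util.regex` and not the paper's automata model. Every class and method name below (`BacktrackDemo`, `Node`, `match`, `countSteps`, and so on) is hypothetical and chosen for illustration. It assumes Java 17 or later for records and pattern matching with `instanceof`.

The example uses the pattern `(a|a)*b` and the family of strings a^n. Each `a` can be consumed by either alternative, and no `b` ever appears. A backtracking engine therefore retries every combination of choices before it gives up. The unambiguous pattern `a*b` rejects the same strings in linear time.

```java
import java.util.function.IntPredicate;

// Toy backtracking matcher, loosely modelled on regex-directed engines such as Java's.
// All names are hypothetical; this is NOT java.util.regex. Requires Java 17+.
public class BacktrackDemo {
    // Minimal regex AST: single characters, alternation, concatenation, greedy star.
    interface Node {}
    record Chr(char c) implements Node {}
    record Alt(Node left, Node right) implements Node {}
    record Seq(Node first, Node second) implements Node {}
    record Star(Node body) implements Node {}

    // (a|a)*b has an ambiguous star body; a*b does not.
    static final Node EVIL = new Seq(new Star(new Alt(new Chr('a'), new Chr('a'))), new Chr('b'));
    static final Node SAFE = new Seq(new Star(new Chr('a')), new Chr('b'));

    static long steps; // number of match() invocations, used as a proxy for running time

    // Match node n at position i of s; on success, pass the end position to continuation k.
    // Returning false makes the caller backtrack to its next alternative.
    static boolean match(Node n, String s, int i, IntPredicate k) {
        steps++;
        if (n instanceof Chr ch) {
            return i < s.length() && s.charAt(i) == ch.c() && k.test(i + 1);
        } else if (n instanceof Alt alt) {
            // Leftmost alternative first; the right one is tried only if the left fails.
            return match(alt.left(), s, i, k) || match(alt.right(), s, i, k);
        } else if (n instanceof Seq seq) {
            return match(seq.first(), s, i, j -> match(seq.second(), s, j, k));
        } else if (n instanceof Star st) {
            // Greedy: try one more iteration (which must consume input), else stop here.
            return match(st.body(), s, i, j -> j > i && match(st, s, j, k)) || k.test(i);
        }
        throw new IllegalArgumentException("unknown node: " + n);
    }

    // Anchored match of the whole input, in the spirit of Matcher.matches().
    static boolean fullMatch(Node pattern, String s) {
        return match(pattern, s, 0, j -> j == s.length());
    }

    // Runs a full match of the pattern against a^n and returns the step count.
    static long countSteps(Node pattern, int n) {
        steps = 0;
        fullMatch(pattern, "a".repeat(n));
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n += 5) {
            System.out.printf("n=%2d  (a|a)*b: %,12d steps   a*b: %,4d steps%n",
                    n, countSteps(EVIL, n), countSteps(SAFE, n));
        }
    }
}
```

For this toy matcher, a failing match of `(a|a)*b` on a^n takes 10·2^n − 4 steps, while `a*b` takes 3n + 4 steps. At n = 10 that is 10,236 steps versus 34. The gap comes from the star: at every position, the rest of the match is re-explored once through each of the two alternatives.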
Related papers
On the Semantics of Atomic Subgroups in Practical Regular Expressions
Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in for example Perl, Java, Ruby and .NET, also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent ne...
Efficient Submatch Extraction for Practical Regular Expressions
Stuart Haber, William Horne, Pratyusa Manadhata, Miranda Mowbray, Prasad Rao. HP Laboratories, HPL-2012-41R1; internal posting date March 6, 2012. Keywords: regular expressions; submatch extraction; capturing groups. A capturing group is a syntax used in modern regular expression implementations to specify a subexpression of a regul...
Semantics, analysis and security of backtracking regular expression matchers
Regular expressions are ubiquitous in computer science. Originally defined by Kleene in 1956, they have become a staple of the computer science undergraduate curriculum. Practical applications of regular expressions are numerous, ranging from compiler construction through smart text editors to network intrusion detection systems. Despite having been vigorously studied and formalized in many way...
The Formal Semantics of Rascal Light
Rascal [4] is a programming language that aims to simplify software language engineering tasks like defining syntax, analyzing and transforming programs, and generating code. The language provides many high-level features including native support for collections (lists, sets, maps), algebraic data-types, powerful pattern matching operations with backtracking, and high-level traversals supportin...
Static Analysis for Regular Expression Exponential Runtime via Substructural Logics
Regular expression matching using backtracking can have exponential runtime, leading to an algorithmic complexity attack known as REDoS in the systems security literature. In this paper, we present a static analysis that detects whether a given regular expression can have exponential runtime for some inputs. The analysis works by forming powers and products of transition relations and thereby r...
Publication date: 2014